Low-level Fusion of Audio and Video Feature for Multi-modal Emotion Recognition

نویسندگان

  • Matthias Wimmer
  • Björn Schuller
  • Dejan Arsic
  • Gerhard Rigoll
  • Bernd Radig
چکیده

Bimodal emotion recognition through audiovisual feature fusion has been shown superior over each individual modality in the past. Still, synchronization of the two streams is a challenge, as many vision approaches work on a frame basis opposing audio turnor chunk-basis. Therefore, late fusion schemes such as simple logic or voting strategies are commonly used for the overall estimation of underlying affect. However, early fusion is known to be more effective in many other multimodal recognition tasks. We therefore suggest a combined analysis by descriptive statistics of audio and video Low-Level-Descriptors for subsequent static SVM Classification. This strategy also allows for a combined feature-space optimization which will be discussed herein. The high effectiveness of this approach is shown on a database of 11.5h containing six emotional situations in an airplane scenario.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Low-Level Fusion of Audio, Video Feature for Multi-Modal Emotion Recognition

Bimodal emotion recognition through audiovisual feature fusion has been shown superior over each individual modality in the past. Still, synchronization of the two streams is a challenge, as many vision approaches work on a frame basis opposing audio turnor chunk-basis. Therefore, late fusion schemes such as simple logic or voting strategies are commonly used for the overall estimation of under...

متن کامل

Multi-Modal Emotion Recognition Fusing Video and Audio

Emotion plays an important role in human communications. We construct a framework for multi-modal fusion emotion recognition. Facial expression features and speech features are respectively extracted from image sequences and speech signals. In order to locate and track facial feature points, we construct an Active Appearance Model for facial images with all kinds of expressions. Facial Animatio...

متن کامل

Fusion Framework for Emotional Electrocardiogram and Galvanic Skin Response Recognition: Applying Wavelet Transform

Introduction To extract and combine information from different modalities, fusion techniques are commonly applied to promote system performance. In this study, we aimed to examine the effectiveness of fusion techniques in emotion recognition. Materials and Methods Electrocardiogram (ECG) and galvanic skin responses (GSR) of 11 healthy female students (mean age: 22.73±1.68 years) were collected ...

متن کامل

Visual and Audio Aware Bi-Modal Video Emotion Recognition

With rapid increase in the size of videos online, analysis and prediction of affective impact that video content will have on viewers has attracted much attention in the community. To solve this challenge several different kinds of information about video clips are exploited. Traditional methods normally focused on single modality, either audio or visual. Later on some researchers tried to esta...

متن کامل

Spatiotemporal Networks for Video Emotion Recognition

Our article presents an audio-visual based multi-modal emotion classification system. Considering the fact of deep learning approaches to facial analysis have recently demonstrated high performance, in our work, we use convolutional neural networks (CNNs) for emotion recognition in video, relying on temporal averaging and pooling operations reminiscent of widely used approaches for the spatial ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007